For my final project, I used graphs and visualizations generated from data on women in African countries and recreated those trends in photographs. I think that using human models and real-world objects to exhibit some of the high rates of violence and oppression against women is a more impactful representation than computer-generated graphs alone. When doing research, for this class and others, I have found that it is easy to accidentally dehumanize a situation by focusing on the numbers. My goal with this project is to draw connections between the data and the people that the data describes.
Throughout this project you will notice that for many of the datasets, I have filtered out data from before the year 2010. This is because I believe that looking at the last 10 years of data will provide the most accurate insight into what the quality of life is like for women in African countries today, while still giving me enough information to work with.
Below I import CSV and Excel files containing indicators of quality of life for women in African countries, as well as data on GDP and other economic indicators.
%matplotlib inline
import pandas as pd
import numpy as np
from IPython.display import display_html
import matplotlib.pyplot as plt
from IPython.display import Image
# The ratio of females to males who are literate. The ages of those surveyed
# range from 15-24.
literacy_rates = pd.read_csv('../data/ratio_of_young_literate_females_to_males_percent_ages_15_24.csv')
# Women who believe a husband is justified in beating his wife for any of the following 5 reasons:
# arguing with him, burning the food, neglecting the children, going out without telling him, or
# refusing him sex.
justified_violence = pd.read_csv('../data/sg_vaw_reas_zs.csv')
# Percentage of women ages 15-49 who have been subject to physical or sexual violence
# in the last 12 months.
violence_df = pd.read_csv('../data/sg_vaw_1549_zs.csv')
# OECD data on GDP of countries around the world. Oil rents only go up to the year 2017.
gdp_df = pd.read_csv('../data/OECDdata.csv')
# World Bank data on the proportion of seats held by women in national parliaments (%).
parliament_df = pd.read_excel(r'../data/women_in_parliament_data.xls')
# Recent data on oil rents as a percent of GDP.
oil_gdp_recent_df = pd.read_csv('../data/recent_oil_rents_of_gdp.csv')
Below is a list of all the countries in Africa. This will be helpful when filtering data out of larger world datasets. I also begin to filter out data from before the year 2010 for the dataframe containing information on violence against women.
African_countries = ['Algeria', 'Angola', 'Benin', 'Botswana', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cameroon',
'Central African Republic (CAR)', 'Chad', 'Comoros', 'Congo, Democratic Republic of the',
'Congo, Republic of the', "Cote d'Ivoire", 'Djibouti', 'Egypt', 'Equatorial Guinea', 'Eritrea',
'Eswatini (formerly Swaziland)', 'Ethiopia', 'Gabon', 'Gambia', 'Ghana', 'Guinea', 'Guinea-Bissau',
'Kenya', 'Lesotho', 'Liberia', 'Libya', 'Madagascar', 'Malawi', 'Mali', 'Mauritania', 'Mauritius',
'Morocco', 'Mozambique', 'Namibia', 'Niger', 'Nigeria', 'Rwanda', 'Sao Tome and Principe',
'Senegal', 'Seychelles', 'Sierra Leone', 'Somalia', 'South Africa', 'South Sudan', 'Sudan',
'Tanzania', 'Togo', 'Tunisia', 'Uganda', 'Zambia', 'Zimbabwe']
# Using the last 10 years of data: dropping the columns for 2000 through 2009.
for i in range(2000, 2010):
    del violence_df[str(i)]
# Using only the countries in Africa.
africa_violence = violence_df.loc[violence_df["country"].isin(African_countries)]
Only 29 out of the 54 countries in Africa had data on the percentage of women between the ages of 15 and 49 who had been subject to physical or sexual violence for a 12 month period.
len(africa_violence)
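Part of the gap between 29 and 54 could come from spelling rather than missing data: entries such as 'Central African Republic (CAR)' or 'Eswatini (formerly Swaziland)' only match if the dataset spells the country the same way. Below is a minimal sketch of how the unmatched names could be listed. The `sample_df` rows and the `sample_countries` list are hypothetical stand-ins, and I have not confirmed how the real file spells these countries.

```python
import pandas as pd

# Hypothetical stand-in for a world dataset; the real file's spellings may differ.
sample_df = pd.DataFrame({
    "country": ["Burundi", "Central African Republic", "Eswatini", "France"],
    "2017": [27.9, None, None, None],
})

# Hypothetical subset of the country list, written the way it appears in this notebook.
sample_countries = ["Burundi", "Central African Republic (CAR)",
                    "Eswatini (formerly Swaziland)"]

# Names in the list that never appear in the dataset's country column.
unmatched = sorted(set(sample_countries) - set(sample_df["country"]))
print(unmatched)
```

Any name printed here would need to be reconciled with the dataset's spelling before concluding that the country has no data.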
It is important to note that there is a large amount of data missing from various countries and years. This is not because there was no violence, but rather because data relating to this subject matter can be difficult to gather. It is possible that countries with some of the highest rates of violence against women also had high rates of violence overall, which would have made data pertaining to women especially difficult to gather. It is also possible that women in these studies have underreported the rates or severity of violence for fear of social repercussions.
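To get a sense of how much is missing, the gaps can be counted per year and per country. This is only a sketch on a made-up wide table with placeholder country names and values; the real dataframe would be `africa_violence`.

```python
import numpy as np
import pandas as pd

# Made-up wide table in the same layout as the violence data (placeholder values).
wide = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "2010": [np.nan, 12.0, np.nan],
    "2011": [np.nan, np.nan, 8.0],
})

year_cols = [c for c in wide.columns if c != "country"]

# Number of countries with no value in each year.
missing_per_year = wide[year_cols].isna().sum()

# Number of years with a value for each country.
observed_per_country = wide.set_index("country")[year_cols].notna().sum(axis=1)

print(missing_per_year)
print(observed_per_country)
```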
Displayed below is a scatter plot representing the rates at which women had been subject to violence in various countries in Africa. The information comes from a dataset containing the proportions of women who have been subject to physical or sexual violence in the last 12 months. These values represent the percentage of surveyed women in each country between the ages of 15 and 49.
# The code below "melts" the data so that all of the years are in a single column and are easier
# to work with.
melted_violence_df = pd.melt(africa_violence,
["country"],
var_name = "year",
value_name = "rate of violence")
# Drawing a graph and storing the result.
scatter = melted_violence_df.plot.scatter(x='year', y='rate of violence', figsize=(5,8))
scatter.set_ylabel("Violence Rates (% of total women in country ages 15-49)")
scatter.set_xlabel("Year")
The scatter plot shown above is helpful for seeing general trends in rates of women subject to violence on the African continent. To get more detailed information, we'll look at the bar graph shown below which displays the proportion of women subject to violence by country and year.
# Copying so that the new column added below does not trigger a chained-assignment warning.
violence_present_df = melted_violence_df.dropna().copy()
violence_present_df = violence_present_df.astype({"year":str, "country":str})
violence_present_df["Country"] = violence_present_df["country"] + " (" + violence_present_df["year"] + ")"
violence_present_df.plot(x='Country', y='rate of violence', kind='bar', figsize=(15,5),
title="Percent of Women Subject to Violence in the last 12 months")
# Getting data for Burundi.
violence_present_df.loc[violence_present_df['country']=='Burundi']
# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Burundi 2017', ' '
sizes = [27.9, 72.1]
explode = (0.1, 0) # only "explode" the 1st slice
fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
shadow=True, startangle=90)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
The pie chart above represents the 27.9% of women who were subject to violence in Burundi in 2017. This is the highest rate of violence for the most recent year we see in our data. To get a better idea of how accurate this data is, we will explore a dataset containing information on the proportion of women who believe their husbands are justified in beating them for one of five reasons: arguing with him, burning the food, neglecting the children, going out without telling him, or refusing him sex.
The image displayed below is a recreation of the pie chart using two of my friends who were willing to help with the project. I asked them to wear hoodies and gloves to conceal their identities for the photos; in doing this I was trying not to make the stories of other women my own. My friends and I do not look like the women who were surveyed to get this data, nor have we had similar experiences to theirs. It was important that this recreation promoted learning from their stories while not attempting to tell them myself.
Image(filename = "../images/IMG_1106 2.JPG", width = 300, height = 300)
# Tidying the dataframe of justified violence.
justified_violence = pd.melt(justified_violence,
["country"],
var_name = "year",
value_name = "rate of violence")
justified_violence = justified_violence.dropna()
# Scatter plot. Drawing a graph and storing the result.
scatter = justified_violence.plot.scatter(x='year', y='rate of violence', figsize=(15,8), ylim=(0.0, 100.0))
scatter.set_ylabel("Percent of women who believe husbands are justified in beating them")
scatter.set_xlabel("year")
The scatter plot displayed above shows the percent of women who believe their husbands are justified in beating them for one of the five reasons previously mentioned. I thought it was important to include data from before 2010 to demonstrate that this trend has not changed over time, as some other trends have. I predict that countries in which this indicator was high in 1999 are countries that still have high rates of both this indicator and overall violence against women.
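One way this prediction could be checked is to compare each country's earliest observation with its latest one. The sketch below uses made-up rows with placeholder country names and values, in the same long format as the tidied dataframe. It is an illustration of the idea, not a result.

```python
import pandas as pd

# Made-up long-format rows (placeholder countries and values, not real data).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": ["1999", "2016", "2001", "2015", "2000", "2017"],
    "rate of violence": [80.0, 70.0, 40.0, 35.0, 60.0, 50.0],
})
toy["year"] = toy["year"].astype(int)
toy = toy.sort_values(["country", "year"])

# Earliest and latest observation for each country.
grouped = toy.groupby("country")["rate of violence"]
first_last = pd.DataFrame({"earliest": grouped.first(), "latest": grouped.last()})

# A Pearson correlation near 1 would suggest that countries that started high stayed high.
persistence = first_last["earliest"].corr(first_last["latest"])
print(first_last)
print(persistence)
```

With only a handful of countries, a coefficient like this should be read as a rough indication rather than proof.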
# Bar graph.
# Keeping only data from 2015 onward.
justified_violence = justified_violence[justified_violence['year'].astype(int) >= 2015]
justified_violence = justified_violence.astype({"year":str, "country":str})
justified_violence["Country + Year"] = justified_violence["country"] + " (" + justified_violence["year"] + ")"
justified_violence.plot(x='Country + Year', y='rate of violence', kind='bar', figsize=(15,5), ylim=(0.0, 100.0),
title="Percent of women who believe husbands are justified in beating them")
The chart displayed above shows some of the most recent data broken down by country and year, so that we can get a better idea of African women's feelings toward domestic violence in the present day. Shown below is a recreation of this graph using burnt pasta, a reference to one of the five reasons for which a woman might think her husband is justified in beating her (burning the food).
Image(filename = "../images/IMG_1150 2.JPG", width = 850, height = 400)
# Getting data for Burundi
justified_violence.loc[justified_violence['country']=='Burundi']
# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Burundi 2017', ' '
sizes = [61.8, 38.2]
explode = (0.1, 0) # only "explode" the 1st slice
fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
shadow=True, startangle=90)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
The pie chart above represents the 61.8% of women in Burundi in 2017 who believed their husbands were justified in beating them for arguing with him, burning the food, neglecting the children, going out without telling him, or refusing him sex. This is significantly different from the 27.9% of women who reported experiencing physical or sexual violence in that same year. This leads me to believe that there was underreporting in the dataset on the percentage of women who had been subject to physical or sexual violence within the last 12 months. Below is the recreation.
Image(filename = "../images/IMG_1110 2.JPG", width = 300, height = 300)
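The Burundi comparison could be extended to every country and year where both indicators are available. Here is a rough sketch: the Burundi 2017 values are the ones quoted above, while the "Example" rows and the column names `experienced (%)` and `justified (%)` are placeholders I made up for illustration.

```python
import pandas as pd

# Share of women who reported violence in the last 12 months.
experienced = pd.DataFrame({
    "country": ["Burundi", "Example"],
    "year": ["2017", "2016"],
    "experienced (%)": [27.9, 10.0],  # "Example" value is a placeholder
})

# Share of women who believe a husband is justified in beating his wife.
justified = pd.DataFrame({
    "country": ["Burundi", "Example"],
    "year": ["2017", "2015"],
    "justified (%)": [61.8, 30.0],  # "Example" value is a placeholder
})

# Keep only country-years present in both datasets, then compute the difference.
both = experienced.merge(justified, on=["country", "year"], how="inner")
both["gap"] = both["justified (%)"] - both["experienced (%)"]
print(both)
```

A large gap does not prove underreporting on its own, but it flags the country-years worth a closer look.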
literacy_rates = pd.melt(literacy_rates,
["country"],
var_name = "year",
value_name = "literacy rate")
literacy_rates = literacy_rates.dropna()
# Keeping data from the year 2000 onward.
literacy_rates = literacy_rates[literacy_rates['year'].astype(int) >= 2000]
# Using only the countries in Africa.
literacy_rates = literacy_rates[literacy_rates['country'].isin(African_countries)]
# Gets the 10 lowest literacy ratios from the year 2000 onward.
lowest_lit_df = literacy_rates.nsmallest(10, ['literacy rate'])
lowest_lit_df = lowest_lit_df.astype({"year":str, "country":str})
lowest_lit_df["Country + Year"] = lowest_lit_df["country"] + " (" + lowest_lit_df["year"] + ")"
lowest_lit_df.plot(x='Country + Year', y='literacy rate', kind='bar', figsize=(15,5),ylim=(0.0, 1.5),
title="10 Lowest Literacy Ratios")
Image(filename = "../images/IMG_1159 2.JPG", width = 850, height = 400)
# Gets the 10 highest literacy ratios from the year 2000 onward.
highest_lit_df = literacy_rates.nlargest(10, ['literacy rate'])
highest_lit_df = highest_lit_df.astype({"year":str, "country":str})
highest_lit_df["Country + Year"] = highest_lit_df["country"] + " (" + highest_lit_df["year"] + ")"
highest_lit_df.plot(x='Country + Year', y='literacy rate', kind='bar', figsize=(15,5),ylim=(0.0, 1.5),
title="10 Highest Literacy Ratios")
Image(filename = "../images/IMG_1162 2.JPG", width = 850, height = 400)
In this section I am going to use a dataset with information on the proportion of women who have been subject to violence and merge this information into a dataframe that contains values representing the percentage of a country's GDP that comes from oil rents. I predict that there will be a correlation between countries with large amounts of oil revenues and rates of women subject to violence.
# Tidying up the GDP data so that it is easier to look at various indicators. Examples of indicators in this dataset
# include oil rents, total natural resource rents, and mineral rents as a percent of each country's total GDP.
gdp_df = gdp_df.drop(["COUNTRY", "INDICATOR", "TABLE", "YEAR", "Flag Codes", "Flags"], axis = 1)
# Using only the countries in Africa.
africa_gdp_df = gdp_df.loc[gdp_df["Country"].isin(African_countries)]
# Isolating the oil rents indicator so that it is the only one in the current dataframe.
oil_rents_df = africa_gdp_df.loc[africa_gdp_df["Indicator"] == "Oil rents (% of GDP)"]
# Changing column names to accurately reflect its values and dropping unnecessary columns.
oil_rents_df = oil_rents_df.rename(columns={"Value":"Oil rents (% of GDP)"})
oil_rents_df = oil_rents_df.drop(["Indicator", "Table name"], axis = 1)
# Tidying the rates of violence data so that we can more easily merge it.
melted_violence_df = melted_violence_df.rename(columns={"country":"Country", "year":"Year"})
# Combining datasets to see if there is a relationship between oil rents and rates of violence.
oil_to_violence_df = melted_violence_df.merge(oil_rents_df, on=["Country", "Year"], how="inner")
# dropping rows with nans
oil_to_violence_df = oil_to_violence_df.dropna()
# Preparing to display the two dataframes side by side.
df1 = oil_to_violence_df.sort_values(ascending=False, by=["Oil rents (% of GDP)"])
df2 = oil_rents_df.sort_values(ascending=False, by=["Oil rents (% of GDP)"])[:13]
df1_styler = df1.style.set_table_attributes("style='display:inline'").set_caption('Oil and Violence')
df2_styler = df2.style.set_table_attributes("style='display:inline'").set_caption('Oil Rents')
display_html(df1_styler._repr_html_() + " . . . ." + df2_styler._repr_html_(), raw=True)
LEFT: Countries with data on both the rate of physical or sexual violence against women and oil rents as a percent of GDP, sorted in decreasing order of oil rents. If a country did not have data on both indicators, it was not included in this table.
RIGHT: The 13 rows in the oil rents dataframe with the highest oil rents as a percent of GDP, sorted in decreasing order.
CONCLUSIONS: Looking at the Oil and Violence table on the left, it is clear that there is not enough data to establish a correlation between rates of violence against women and the share of a country's GDP that comes from oil rents (only two countries in the merged dataframe!).
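Because so few rows survived the merge, it may be worth checking why rows dropped out. A sketch under assumptions: the rows below are made up with placeholder names and values, and I am assuming the year is stored as text in one table and as an integer in the other, which may or may not be the case in the real files.

```python
import pandas as pd

# Made-up rows; "Year" is text on one side and an integer on the other.
violence = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Year": ["2016", "2013", "2017"],
    "rate of violence": [25.0, 11.0, 20.0],
})
oil = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country D"],
    "Year": [2016, 2013, 2016],
    "Oil rents (% of GDP)": [10.0, 9.0, 12.0],
})

# Align the key dtypes first; text and integer keys do not match each other.
violence["Year"] = violence["Year"].astype(int)

# An outer merge with indicator=True labels which table each row came from.
check = violence.merge(oil, on=["Country", "Year"], how="outer", indicator=True)
counts = check["_merge"].value_counts()
print(counts)
```

Rows labeled `left_only` or `right_only` show which country-years exist in just one dataset, which helps separate genuinely missing data from mismatched keys.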
# Adding the proper column headings to the women in parliament dataset.
col_list = parliament_df.iloc[2].tolist()
# Year headers that were read in as floats (e.g. 2010.0) are converted to integers.
col_list = [int(c) if isinstance(c, float) and c.is_integer() else c for c in col_list]
parliament_df.columns = col_list
# Dropping the first three rows which don't contain data.
parliament_df = parliament_df.iloc[3:]
parliament_df = parliament_df.reset_index()
parliament_df = parliament_df.drop(['Country Code', 'Indicator Name', 'Indicator Code','index'],axis = 1)
# Using only the countries in Africa.
parliament_df = parliament_df.loc[parliament_df["Country Name"].isin(African_countries)]
# Melting the dataset to have a years column.
melted_parliament_df = pd.melt(parliament_df,
["Country Name"],
var_name = "Year",
value_name = "Women in Parliament (%)")
par_by_country_df = melted_parliament_df.groupby('Country Name')['Women in Parliament (%)']
# Dropping rows without values.
parliament_present_df = melted_parliament_df.dropna()
# Using the last 10 years of data.
parliament_present_df = parliament_present_df[parliament_present_df['Year'] >= 2010]
parliament_present_df = parliament_present_df.astype({"Year":str, "Country Name":str})
parliament_present_df["Country"] = parliament_present_df["Country Name"] + " (" + parliament_present_df["Year"] + ")"
parliament_present_df.plot(x='Country Name', y='Women in Parliament (%)', kind='scatter', figsize=(15,5),rot=90)
The graph above represents the proportion of seats held by women in national parliaments. Each country has a point representing the data available for every year since 2010. Many countries are missing data for various years or only have data for a few of the most recent years.
par_vi_df = violence_present_df.merge(parliament_present_df, on=["Country"], how="inner")
par_vi_df.plot.scatter(x="Women in Parliament (%)", y="rate of violence", alpha=.5)
The scatter plot above shows the percent of a country's parliament that is female plotted against the rate of violence in that country, for all the countries we have data for since 2010. The rate of violence indicator comes from the dataset of women who had been subject to physical or sexual violence within the last 12 months. My expectation before seeing this graph was that there would be an inverse relationship between the two variables, meaning that as the share of women in parliament increased, the rate of violence in the country would decrease. Although that may be the trend on the right half of the graph (above about 30% women in parliament), there appears to be no relationship between the two variables below that point. With a larger dataset, we might have been able to see more of a correlation.
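The visual impression could be backed up with a correlation coefficient, computed once for all points and once for the points at or above 30% women in parliament. The numbers below are made up to illustrate the calculation; the real input would be `par_vi_df`.

```python
import pandas as pd

# Made-up points (placeholder values, not real data).
toy = pd.DataFrame({
    "Women in Parliament (%)": [10.0, 20.0, 30.0, 40.0, 50.0],
    "rate of violence": [20.0, 25.0, 22.0, 15.0, 10.0],
})

# Pearson correlation across all points.
r_all = toy["Women in Parliament (%)"].corr(toy["rate of violence"])

# Pearson correlation for the right half of the graph only.
right = toy[toy["Women in Parliament (%)"] >= 30]
r_right = right["Women in Parliament (%)"].corr(right["rate of violence"])

print(r_all, r_right)
```

A value close to -1 indicates a strong inverse relationship and a value near 0 indicates none. With so few points, either number should be treated cautiously.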
Below is the recreation of the scatter plot. The graph was painted on a door found in the alley on the side of my house. My initial hypothesis, that violence against women would decrease as the share of women in parliament increased, reflected the possibility that more women hold positions of power where there is increased respect for women in a country. Initially I thought the door was a nice place for this graph, as it could represent a metaphorical door leading to greater opportunities for women.
Image(filename = "../images/IMG_1130 3.JPG", width = 500, height = 300)